---
title: "HuggingFaceAPIChatGenerator"
id: huggingfaceapichatgenerator
slug: "/huggingfaceapichatgenerator"
description: "This generator enables chat completion using various Hugging Face APIs."
---

# HuggingFaceAPIChatGenerator

This generator enables chat completion using various Hugging Face APIs.

|  |  |
| --- | --- |
| **Most common position in a pipeline** | After a [`ChatPromptBuilder`](../builders/chatpromptbuilder.mdx)  |
| **Mandatory init variables** | "api_type": The type of Hugging Face API to use <br /><br />"api_params": A dictionary with one of the following keys: <br /><br />- `model`: Hugging Face model ID. Required when `api_type` is `SERVERLESS_INFERENCE_API`. <br />**OR**<br />- `url`: URL of the inference endpoint. Required when `api_type` is `INFERENCE_ENDPOINTS` or `TEXT_GENERATION_INFERENCE`. <br /><br />"token": The Hugging Face API token. Can be set with the `HF_API_TOKEN` or `HF_TOKEN` env var. |
| **Mandatory run variables** | "messages": A list of [`ChatMessage`](../../concepts/data-classes/chatmessage.mdx) objects representing the chat |
| **Output variables** | "replies": A list of the LLM's replies to the input chat |
| **API reference** | [Generators](/reference/generators-api) |
| **GitHub link** | https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/hugging_face_api.py |

## Overview

`HuggingFaceAPIChatGenerator` can be used to generate chat completions using different Hugging Face APIs:

- [Serverless Inference API (Inference Providers)](https://huggingface.co/docs/inference-providers) - free tier available
- [Paid Inference Endpoints](https://huggingface.co/inference-endpoints)
- [Self-hosted Text Generation Inference](https://github.com/huggingface/text-generation-inference)

This component's main input is a list of `ChatMessage` objects. `ChatMessage` is a data class that contains a message, a role (who generated the message, such as `user`, `assistant`, `system`, `function`), and optional metadata. For more information, check out our [`ChatMessage` docs](../../concepts/data-classes/chatmessage.mdx).
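
For instance, you can build a short exchange with the class's convenience constructors; a minimal illustration:

```python
from haystack.dataclasses import ChatMessage

## each constructor sets the corresponding role on the message
messages = [
    ChatMessage.from_system("You are a helpful assistant."),
    ChatMessage.from_user("What's Natural Language Processing?"),
]

print(messages[1].role)  # ChatRole.USER
print(messages[1].text)  # What's Natural Language Processing?
```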

:::note
This component is designed for chat completion, so it expects a list of messages, not a single string. If you want to use Hugging Face APIs for simple text generation (such as translation or summarization tasks) or don't want to use the `ChatMessage` object, use [`HuggingFaceAPIGenerator`](huggingfaceapigenerator.mdx) instead.

:::

By default, the component reads the API token from the `HF_API_TOKEN` environment variable. Otherwise, you can pass a Hugging Face API token at initialization with `token` – see the code examples below.
The token is needed:

- if you use the Serverless Inference API, or
- if you use Inference Endpoints.
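
Both options go through Haystack's `Secret` class. A minimal sketch of the two (the raw token value is a placeholder):

```python
from haystack.utils import Secret

## read the token from the environment, checking HF_API_TOKEN first, then HF_TOKEN;
## strict=False avoids an error if neither variable is set
token = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False)

## or wrap a raw token directly; avoid hard-coding secrets outside of quick experiments
token = Secret.from_token("<your-hf-token>")
```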

### Streaming

This Generator supports [streaming](guides-to-generators/choosing-the-right-generator.mdx#streaming-support) tokens from the LLM directly in its output. To enable it, pass a callable to the `streaming_callback` init parameter.
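
For example, you can use Haystack's built-in `print_streaming_chunk` utility as the callback; the model below is just a placeholder:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.generators.utils import print_streaming_chunk
from haystack.dataclasses import ChatMessage

generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct"},
    streaming_callback=print_streaming_chunk,  # called with each incoming chunk as it arrives
)

result = generator.run([ChatMessage.from_user("What's Natural Language Processing?")])
```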

## Usage

### On its own

#### Using Serverless Inference API (Inference Providers) - Free Tier Available

This API allows you to quickly experiment with many models hosted on the Hugging Face Hub, offloading the inference to Hugging Face servers. It's rate-limited and not meant for production.

To use this API, you need a [free Hugging Face token](https://huggingface.co/settings/tokens).
The Generator expects the `model` in `api_params`. It's also recommended to specify a `provider` for better performance and reliability.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

# the api_type can be expressed using the HFGenerationAPIType enum or as a string
api_type = HFGenerationAPIType.SERVERLESS_INFERENCE_API
api_type = "serverless_inference_api" # this is equivalent to the above

generator = HuggingFaceAPIChatGenerator(api_type=api_type,
                                        api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                                    "provider": "together"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)
```
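
The `replies` are `ChatMessage` objects, so, assuming the call succeeds, you can read the generated text like this:

```python
print(result["replies"][0].text)
```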

#### Using Paid Inference Endpoints

In this case, a private instance of the model is deployed by Hugging Face, and you typically pay per hour.

To understand how to spin up an Inference Endpoint, visit [Hugging Face documentation](https://huggingface.co/inference-endpoints/dedicated).

In this case, you also need to provide your Hugging Face token.
The Generator expects the `url` of your endpoint in `api_params`.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="inference_endpoints",
                                        api_params={"url": "<your-inference-endpoint-url>"},
                                        token=Secret.from_env_var("HF_API_TOKEN"))

result = generator.run(messages)
print(result)
```

#### Using Serverless Inference API (Inference Providers) with Text+Image Input

You can also use this component with multimodal models that support both text and image input:

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage, ImageContent
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# Create an image from a file path, URL, or base64
image = ImageContent.from_file_path("path/to/your/image.jpg")

# Create a multimodal message with both text and image
messages = [ChatMessage.from_user(content_parts=["Describe this image in detail", image])]

generator = HuggingFaceAPIChatGenerator(
    api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
    api_params={
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # Vision Language Model
        "provider": "hyperbolic"
    },
    token=Secret.from_token("<your-api-key>")
)

result = generator.run(messages)
print(result)
```
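
The comment in the example also mentions URLs and base64 strings; assuming the same `ImageContent` API, those variants look roughly like this (URL and payload are placeholders):

```python
from haystack.dataclasses import ImageContent

## fetch and encode an image from a remote URL
image = ImageContent.from_url("https://example.com/image.jpg")

## or wrap an already base64-encoded payload
image = ImageContent(base64_image="<base64-encoded-image>", mime_type="image/jpeg")
```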

#### Using Self-Hosted Text Generation Inference (TGI)

[Hugging Face Text Generation Inference](https://github.com/huggingface/text-generation-inference) is a toolkit for efficiently deploying and serving LLMs.

While it powers the most recent versions of the Serverless Inference API and Inference Endpoints, it can also be easily used on premises through Docker.

For example, you can run a TGI container as follows:

```shell
model=HuggingFaceH4/zephyr-7b-beta
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 --model-id $model
```

For more information, refer to the [official TGI repository](https://github.com/huggingface/text-generation-inference).

The Generator expects the `url` of your TGI instance in `api_params`.

```python
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage

messages = [ChatMessage.from_system("\nYou are a helpful, respectful and honest assistant"),
            ChatMessage.from_user("What's Natural Language Processing?")]

generator = HuggingFaceAPIChatGenerator(api_type="text_generation_inference",
                                        api_params={"url": "http://localhost:8080"})

result = generator.run(messages)
print(result)
```

### In a pipeline

```python
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack import Pipeline
from haystack.utils import Secret
from haystack.utils.hf import HFGenerationAPIType

# no parameter init, we don't use any runtime template variables
prompt_builder = ChatPromptBuilder()
llm = HuggingFaceAPIChatGenerator(api_type=HFGenerationAPIType.SERVERLESS_INFERENCE_API,
                                  api_params={"model": "Qwen/Qwen2.5-7B-Instruct",
                                              "provider": "together"},
                                  token=Secret.from_env_var("HF_API_TOKEN"))

pipe = Pipeline()
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("prompt_builder.prompt", "llm.messages")

location = "Berlin"
messages = [ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
            ChatMessage.from_user("Tell me about {{location}}")]

result = pipe.run(data={"prompt_builder": {"template_variables": {"location": location}, "template": messages}})

print(result)
```
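
In a pipeline, the output is nested under the component's name, so the generated messages can be read with:

```python
print(result["llm"]["replies"][0].text)
```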

## Additional References

🧑‍🍳 Cookbook: [Build with Google Gemma: chat and RAG](https://haystack.deepset.ai/cookbook/gemma_chat_rag)
